A Finite State Network for Phonetic Text Processing
Abstract
In the past, phonetic transcriptions were made using a wide variety of fonts and formats, which hampered the development of phonetic text processing tools. Today, however, the increasing number of language documentation projects making their data freely available over the Web, combined with the adoption of the Unicode Standard by linguists as the "best practice" character encoding, presents linguistic software developers with an unprecedented opportunity to develop powerful tools for the analysis of phonetic text. This paper describes the generation of a finite state transducer that converts text represented in the International Phonetic Alphabet into phonetic feature sets.
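As a rough illustration of the idea in the abstract, the sketch below builds a small deterministic transducer whose arcs consume Unicode code points and whose accepting states emit a feature set, using longest-match so that a base letter plus a modifier (such as the aspiration mark U+02B0) is read as one symbol. It is not the paper's construction: the names `FEATURES`, `State`, `build_transducer`, and `transduce` are hypothetical, and the five-symbol inventory is an assumption chosen only to keep the example short.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical, deliberately tiny inventory; a real table would cover the IPA chart.
FEATURES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"voiceless", "bilabial", "plosive"}),
    # "p" followed by U+02B0 MODIFIER LETTER SMALL H (aspiration)
    "p\u02b0": frozenset({"voiceless", "bilabial", "plosive", "aspirated"}),
    "b": frozenset({"voiced", "bilabial", "plosive"}),
    "m": frozenset({"voiced", "bilabial", "nasal"}),
    "a": frozenset({"open", "front", "unrounded", "vowel"}),
}


class State:
    """One transducer state: outgoing arcs plus an optional output."""

    def __init__(self) -> None:
        self.arcs: Dict[str, "State"] = {}
        self.output: Optional[FrozenSet[str]] = None


def build_transducer(table: Dict[str, FrozenSet[str]]) -> State:
    """Build a trie-shaped transducer: one arc per code point of each symbol."""
    start = State()
    for symbol, features in table.items():
        state = start
        for ch in symbol:
            state = state.arcs.setdefault(ch, State())
        state.output = features  # accepting state emits the feature set
    return start


def transduce(start: State, text: str) -> List[FrozenSet[str]]:
    """Convert a string into a list of feature sets using longest match."""
    result: List[FrozenSet[str]] = []
    i = 0
    while i < len(text):
        state, j = start, i
        last_output: Optional[FrozenSet[str]] = None
        last_end = i
        # Follow arcs as far as possible, remembering the last accepting state.
        while j < len(text) and text[j] in state.arcs:
            state = state.arcs[text[j]]
            j += 1
            if state.output is not None:
                last_output, last_end = state.output, j
        if last_output is None:
            raise ValueError(f"no transition for {text[i]!r} at position {i}")
        result.append(last_output)
        i = last_end  # restart from the start state after the matched symbol
    return result


if __name__ == "__main__":
    fst = build_transducer(FEATURES)
    for features in transduce(fst, "p\u02b0am"):
        print(sorted(features))
```

Longest match is one simple way to keep a diacritic attached to its base letter; input that mixes precomposed and decomposed forms would likely need Unicode normalization first, which this sketch leaves out.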
Similar resources
Book Reviews: Statistical Methods for Speech Recognition
Current practitioners in the area of speech recognition who are familiar with the approach of Jelinek and others will find this a compact, concise, and useful overview of the state of the art in statistical approaches to speech recognition. Readers already familiar with Rabiner and Juang (1993) will find it an excellent companion volume. Computational linguists will also find this book to be en...
Efficient Development of Lexical Language Resources and their Representation
Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, represent the driving forces for the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid construction of morphologic and phonetic lexico...
Design and analysis of a German telephone speech database for phoneme based training
Based on the Sotscheck text corpus, we developed a new corpus that was specifically optimised for training phoneme-based recognition systems. Particular attention was paid to good coverage of phone transitions. Even though the resulting corpus is only slightly enlarged, it shows an increased phonetic coverage while maintaining a good phonetic balance. Results of phonetic statistical analysis ...
The OGI Kids’ Speech corpus and recognizers
We describe a corpus of children’s speech, called the OGI Kids’ Speech corpus, and a speaker- and vocabulary-independent recognition system trained and evaluated with these data. The corpus is composed of both prompted and spontaneous speech from 1100 children from kindergarten through grade 10. The prompted speech was presented as text appearing below an animated character (Baldi) that produced a...
Manipulation in advertising text: lexical and semantic aspect
The present paper focuses on questions of modern advertising science, the structure of advertising, and the elements through which the addresser exerts manipulative influence. Advertising encourages product sales and is an instrument for forming ethical standards and values and for creating cultural values, standards, and modes of behavior; that is why the wide system of means for achieving aims of advertisers is need...